Documentation for the Morphology Lab.

MS, 2004-2007

 

This documentation is minimal. You too can contribute to the NooJ project by writing fuller documentation, with exercises, in your own language!

 

The following morphological operators are available for every language:

 

<B>: keyboard Backspace

<C>: change Case

<D>: Duplicate current char

<E>: Empty string

<L>: keyboard Left arrow

<N>: go to end of Next word form

<P>: go to end of Previous word form

<R>: keyboard Right arrow

<S>: delete/Suppress current char

 

The following morphological operators can take an argument, either a number or W: <B>, <L>, <N>, <P>, <R>, <S>:

xx (a number): repeat the operation xx times

W: apply the operation to the whole word

 

For example:

<B2>: delete the last two characters

<L3>: go left 3 times

<R4>: go right 4 times

<S5>: delete the next 5 characters

<BW>: delete from the current character back to the first character of the current word form

<LW>: go to beginning of current word form

<RW>: go to the end of the current word form

<SW>: delete all the following characters of the current word form
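
The operator notation above can be split into tokens before it is interpreted. The following Python sketch is only an illustration, not NooJ's own parser: the function name tokenize and the tuple layout are hypothetical, and it assumes that the W argument is syntactically allowed on every operator that accepts a number.

```python
import re

# Hypothetical token pattern: <B>, <L>, <N>, <P>, <R>, <S> may carry a number
# or W; <C>, <D>, <E> take no argument (as listed in the documentation above).
OPERATOR = re.compile(r"<(?:([BLNPRS])(\d+|W)?|([CDE]))>")


def tokenize(command):
    """Split a command into ("op", letter, arg) and ("text", literal) tokens.

    arg is an int repeat count (1 when omitted) or the string "W".
    """
    tokens = []
    pos = 0
    for match in OPERATOR.finditer(command):
        if match.start() > pos:
            # Anything between two operators is literal text.
            tokens.append(("text", command[pos:match.start()]))
        letter = match.group(1) or match.group(3)
        arg = match.group(2)
        if arg is None:
            count = 1
        elif arg == "W":
            count = "W"
        else:
            count = int(arg)
        tokens.append(("op", letter, count))
        pos = match.end()
    if pos < len(command):
        tokens.append(("text", command[pos:]))
    return tokens


print(tokenize("<B2>"))   # one Backspace token with a repeat count of 2
print(tokenize("<LW>"))   # one Left token with the whole-word argument
```

Keeping the count and the W marker in the same slot lets a later interpreter decide how far "whole word" reaches for each operator.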

 

Language-specific morphological operators for Catalan, Español, Français, ελληνικά and Português:

 

<Á>: add acute accent

<À>: add grave accent

<Ä>: add dieresis

<Â>: add circumflex

<A>: remove Accent
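
As an illustration of what the accent operators do to a single letter, here is a minimal Python sketch based on Unicode normalization. It is an assumption about the behaviour, not NooJ's implementation: the names MARKS, add_accent and remove_accent are hypothetical, and the sketch assumes that <A> removes only the four marks listed above (so a cedilla or tilde is left alone).

```python
import unicodedata

# Hypothetical mapping from operator letter to Unicode combining mark.
MARKS = {
    "Á": "\u0301",  # combining acute accent
    "À": "\u0300",  # combining grave accent
    "Ä": "\u0308",  # combining diaeresis
    "Â": "\u0302",  # combining circumflex
}


def add_accent(char, operator):
    """Return char with the mark named by operator, composed when possible."""
    return unicodedata.normalize("NFC", char + MARKS[operator])


def remove_accent(char):
    """Strip the four marks above from char; other diacritics are kept."""
    decomposed = unicodedata.normalize("NFD", char)
    kept = "".join(c for c in decomposed if c not in MARKS.values())
    return unicodedata.normalize("NFC", kept)


print(add_accent("e", "À"))
print(remove_accent("é"))
```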

 

Examples of language-specific morphological operators for עברית:

 

<F>: Finalize current letter if not a final letter; unFinalize current letter if letter is final

<G>: insert daGesh if current letter is begadkefat

<H>: insert atef-pataH if current letter is guttural, or shwa

<M>: delete current letter, take dagesh and shin/sin dot into account

 

There are also specific morphological operators for العربية : <M>, <T> and <Z>.

 

In its simplest form, a command is a suffix that is added to the lemma in order to produce the inflected form:

 

cousin [es] => cousines

 

Morphological operator <B> (“Backspace”) is used to DELETE characters from the lemma:

 

voler [<B>] => vole

voler [<B>a] => vola

cheval [<B>ux] => chevaux

recordman [<B3>women] => recordwomen

 

One can delete all the letters of the lemma in order to build an irregular inflected form:

 

avoir [<B5>ont] => ont

avoir [<BW>ont] => ont

 

More complex commands make operations as general as possible. For instance, the following three verbs inflect with the same command, so they can be associated with the same inflectional paradigm:

 

lever [<L3><B>è<R2><S>nt] => lèvent

mener [<L3><B>è<R2><S>nt] => mènent

semer [<L3><B>è<R2><S>nt] => sèment
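
To see why one command fits all three verbs, it helps to follow the cursor step by step. The Python sketch below is an illustrative interpreter, not NooJ's implementation: the name inflect is hypothetical, and it assumes that the cursor starts at the end of the lemma, that <B> deletes before the cursor, that <S> deletes after it, and that movements stop at the word boundaries.

```python
import re

# Either an operator <B>, <L>, <R>, <S> with an optional number or W, or literal text.
TOKEN = re.compile(r"<([BLRS])(\d+|W)?>|([^<]+)")


def inflect(lemma, command):
    """Apply <B>, <L>, <R>, <S> and literal text to a single word form."""
    chars = list(lemma)
    cursor = len(chars)                        # assumed start: end of the lemma
    for op, arg, text in TOKEN.findall(command):
        if text:                               # literal text is inserted at the cursor
            chars[cursor:cursor] = text
            cursor += len(text)
            continue
        # How many characters lie in the direction the operator works in.
        available = cursor if op in "BL" else len(chars) - cursor
        count = available if arg == "W" else min(int(arg or 1), available)
        if op == "L":
            cursor -= count
        elif op == "R":
            cursor += count
        elif op == "B":                        # delete characters before the cursor
            del chars[cursor - count:cursor]
            cursor -= count
        elif op == "S":                        # delete characters after the cursor
            del chars[cursor:cursor + count]
    return "".join(chars)


for verb in ("lever", "mener", "semer"):
    print(verb, "=>", inflect(verb, "<L3><B>è<R2><S>nt"))
```

For lever, <L3> places the cursor after "le", <B> removes that "e", "è" is inserted in its place, <R2> moves past "ve", <S> removes the final "r", and "nt" is appended. The command never mentions the first or third letter, which is why it applies equally to mener and semer.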

 

The <N> and <P> commands are used to inflect Multi-Word Units (or “compounds”). <N> moves the cursor to the end of the next component of the MWU; <P> moves the cursor to the end of the previous component:

 

cousin germain [e<P>e] => cousine germaine

cousin germain [s<P>s] => cousins germains

cousin germain [es<P>es] => cousines germaines
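
The cursor movement performed by <N> and <P> can be sketched in Python as follows. This is an assumption-laden illustration, not NooJ's code: the names end_of_previous, end_of_next and inflect_mwu are hypothetical, components are assumed to be separated by single spaces, and only <N>, <P> and literal text are handled.

```python
import re

# Either <N> or <P> with an optional repeat count, or literal text.
TOKEN = re.compile(r"<([NP])(\d+)?>|([^<]+)")


def end_of_previous(form, cursor):
    """Return the position just after the previous component, if there is one."""
    space = form.rfind(" ", 0, cursor)
    return space if space != -1 else cursor


def end_of_next(form, cursor):
    """Return the position just after the next component, if there is one."""
    start = form.find(" ", cursor)             # space that ends the current component
    if start == -1:
        return cursor                          # already in the last component
    end = form.find(" ", start + 1)
    return end if end != -1 else len(form)


def inflect_mwu(lemma, command):
    """Apply <N>, <P> and literal text to a multi-word unit."""
    form = lemma
    cursor = len(form)                         # assumed start: end of the last component
    for op, arg, text in TOKEN.findall(command):
        if text:                               # literal text is inserted at the cursor
            form = form[:cursor] + text + form[cursor:]
            cursor += len(text)
            continue
        for _ in range(int(arg or 1)):
            if op == "P":
                cursor = end_of_previous(form, cursor)
            else:
                cursor = end_of_next(form, cursor)
    return form


for command in ("e<P>e", "s<P>s", "es<P>es"):
    print("cousin germain", "=>", inflect_mwu("cousin germain", command))
```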